Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, these methods remain rather blind to a wealth of prior information that can be assumed: as the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both kinds of prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among the positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less altered by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
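The first prior (more perturbation, less similarity to the original) can be enforced with a pairwise ranking objective over the augmented views. A minimal sketch, assuming cosine similarity and a hinge-style margin loss; the function name and margin value are illustrative, not from the paper:

```python
import numpy as np

def ranked_views_loss(anchor, views, margin=0.1):
    """Pairwise margin ranking loss over augmented views.

    `views` are ordered by increasing perturbation degree; the prior says a
    more-perturbed view should be *less* similar to the anchor, so every pair
    whose similarities violate that order is penalized.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = [cos(anchor, v) for v in views]
    loss = 0.0
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            # sims[i] (less perturbed) should exceed sims[j] (more perturbed)
            loss += max(0.0, margin - (sims[i] - sims[j]))
    return loss
```

When the similarities already decrease with perturbation degree by at least the margin, the loss is zero; any inversion contributes a positive penalty.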
translated by Google Translate
Artificial intelligence aims to teach machines to act like humans. To achieve intelligent teaching, the machine learning community has begun to study a promising topic named machine teaching, where the teacher designs the optimal (usually minimal) teaching set for a given target model and a specific learner. However, previous works usually require numerous teaching examples and many iterations to guide learners to convergence, which is costly. In this paper, we consider a more intelligent teaching paradigm named one-shot machine teaching, which uses fewer examples to converge faster. Different from typical teaching, this advanced paradigm establishes a tractable mapping from the teaching set to the model parameter. Theoretically, we prove that this mapping is surjective, which serves as an existence guarantee for the optimal teaching set. Then, relying on the surjective mapping from the teaching set to the parameter, we develop a design strategy for the optimal teaching set under appropriate settings, for which two popular efficiency metrics, the teaching dimension and the iterative teaching dimension, both equal one. Extensive experiments verify the efficiency of our strategy and further demonstrate the intelligence of this new teaching paradigm.
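The mapping from a teaching set to a parameter can be made concrete in a toy case. A sketch under strong assumptions (linear learner, squared loss, one gradient step); this construction only illustrates how a single example can move the learner to the target parameter, it is not the paper's general design strategy:

```python
import numpy as np

def one_shot_teaching_example(w0, w_star, lr=1.0):
    """Construct one teaching example (x, y) that moves a linear learner
    trained with squared loss from w0 to w_star in a single gradient step:
    w1 = w0 - lr * (w0 @ x - y) * x.

    Toy construction: pick x along the direction w_star - w0 and solve for y.
    """
    d = w_star - w0
    norm = np.linalg.norm(d)
    if norm == 0:
        return np.zeros_like(w0), 0.0
    x = d / norm
    # choose y so that -lr * (w0 @ x - y) * x == w_star - w0
    y = w0 @ x + norm / lr
    return x, y
```

One gradient step on this single example lands exactly on the target parameter, which is the "one-shot" property in miniature.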
Recent graph-based models for joint multiple intent detection and slot filling have obtained promising results by modeling the guidance from the prediction of intents to the decoding of slot filling. However, existing methods (1) only model the \textit{unidirectional guidance} from intent to slot; (2) adopt \textit{homogeneous graphs} to model the interactions between the slot semantics nodes and intent label nodes, which limits the performance. In this paper, we propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the \textit{mutual guidances} between the two tasks. In the first stage, the initial estimated labels of both tasks are produced, and then they are leveraged in the second stage to model the mutual guidances. Specifically, we propose two \textit{heterogeneous graph attention networks} working on the proposed two \textit{heterogeneous semantics-label graphs}, which effectively represent the relations among the semantics nodes and label nodes. Experimental results show that our model outperforms existing models by a large margin, obtaining a relative improvement of 19.3\% over the previous best model on the MixATIS dataset in overall accuracy.
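Attention on a heterogeneous semantics-label graph can be sketched minimally: each semantics node attends over the label nodes it is linked to. The function name and the dot-product scoring are assumptions; the type-specific projections of a full heterogeneous graph attention network are omitted for brevity:

```python
import numpy as np

def hetero_attention(h_sem, h_lab, adj):
    """One simplified attention step on a semantics-label graph: each
    semantics node aggregates its linked label nodes, weighted by a softmax
    over dot-product scores. Assumes every semantics node has >= 1 label edge.
    """
    scores = h_sem @ h_lab.T                      # (n_sem, n_lab)
    scores = np.where(adj > 0, scores, -np.inf)   # mask non-edges
    scores = scores - scores.max(axis=1, keepdims=True)
    att = np.exp(scores)
    att = att / att.sum(axis=1, keepdims=True)
    return h_sem + att @ h_lab                    # residual aggregation
```

Stacking such layers for both task directions is what would realize the mutual-guidance idea described above.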
Recent joint multiple intent detection and slot filling models employ label embeddings to achieve semantics-label interactions. However, they treat all labels and label embeddings as uncorrelated individuals, ignoring the dependencies among them. Besides, they conduct the decoding of the two tasks independently, without leveraging the correlations between them. Therefore, in this paper, we first construct a Heterogeneous Label Graph (HLG) containing two kinds of topologies: (1) statistical dependencies based on labels' co-occurrence patterns and hierarchies in slot labels; (2) rich relations among the label nodes. Then we propose a novel model termed ReLa-Net, which captures beneficial correlations among the labels from the HLG. The label correlations are leveraged to enhance semantics-label interactions. Moreover, we propose a label-aware inter-dependent decoding mechanism to further exploit the label correlations in decoding. Experimental results show that our ReLa-Net significantly outperforms previous models. Remarkably, ReLa-Net surpasses the previous best model by over 20\% in terms of overall accuracy on the MixATIS dataset.
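The statistical-dependency topology can be built directly from annotated label sets. A minimal sketch, assuming integer label ids and raw co-occurrence counts (the abstract does not specify the exact weighting):

```python
import numpy as np

def label_cooccurrence_graph(label_sets, num_labels):
    """Build the co-occurrence part of an HLG-style label graph: entry (i, j)
    counts how often labels i and j appear together in one utterance's
    annotation. The diagonal is left at zero (no self-loops)."""
    adj = np.zeros((num_labels, num_labels))
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    adj[i, j] += 1
    return adj
```

The resulting matrix is symmetric by construction and could serve as one edge type of the heterogeneous graph, alongside the richer relational edges.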
Learning on big data has brought success to artificial intelligence (AI), but annotation and training are costly. Going forward, learning on small data is one of the ultimate goals of AI, requiring machines to recognize objects and scenarios from small data, as humans do. A series of machine learning models pursue this direction, such as active learning, few-shot learning, and deep clustering. However, there are few theoretical guarantees on their generalization performance. Moreover, most of their settings are passive; that is, the label distribution is explicitly controlled by a specified sampling scheme. This survey follows agnostic active sampling under the PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in both supervised and unsupervised fashions. With these theoretical analyses, we categorize small-data learning models from two geometric perspectives: Euclidean and non-Euclidean (hyperbolic) mean representations, where optimization solutions are also provided and discussed. Potential learning scenarios that may benefit from learning on small data are then summarized and analyzed. Finally, challenging applications that may benefit from learning on small data, such as computer vision and natural language processing, are also surveyed.
Active learning maximizes hypothesis updates to find the desired unlabeled data. An inherent assumption is that this learning manner can derive those updates into the optimal hypothesis. However, if the incremental updates are negative and disordered, its convergence may not be well guaranteed. In this paper, we introduce a machine teacher who provides a black-box teaching hypothesis for an active learner, where the teaching hypothesis is an effective approximation of the optimal hypothesis. Theoretically, we prove that, guided by this teaching hypothesis, the learner can converge to tighter generalization error and label complexity bounds than unguided learners who receive no guidance from a teacher. We further consider two teaching scenarios: teaching a white-box learner and a black-box learner, where self-improvement of teaching is first proposed to improve the teaching performance. Experiments verify this idea and show better performance than baseline active learning strategies such as IWAL and IWAL-D.
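A guided query rule can be sketched with the teaching hypothesis used as a black box. A toy stand-in (the disagreement measure and function name are assumptions, not the paper's exact rule): query the unlabeled points where the learner's predictive distribution disagrees most with the teacher's.

```python
import numpy as np

def teacher_guided_query(probs_learner, probs_teacher, k=1):
    """Select the k unlabeled points where the learner disagrees most with
    the black-box teaching hypothesis, with disagreement measured by the
    total variation distance between the two predictive distributions."""
    tv = 0.5 * np.abs(probs_learner - probs_teacher).sum(axis=1)
    return np.argsort(-tv)[:k]
```

Labeling exactly these points pushes the learner toward the teaching hypothesis, which here approximates the optimal hypothesis.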
Nowadays, deep learning on large-scale data is dominant. The unprecedented scale of data is arguably one of the most important driving forces behind the success of deep learning. However, there still exist scenarios where collecting data or labels can be extremely expensive, such as medical imaging and robotics. To fill this gap, this paper considers the problem of learning from scratch with a small amount of representative data. We first characterize this problem via active learning on homeomorphic tubes of spherical manifolds, which naturally yields a feasible hypothesis class. Using homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analyses. Finally, we demonstrate the empirical performance of MHEAL in a wide range of applications for data-efficient learning, including deep clustering, distribution matching, version space sampling, and deep active learning.
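The quantity an MHE-style objective minimizes is the Riesz s-energy of points on the unit sphere. A minimal sketch (the naive O(n²) loop and the `eps` guard are implementation choices, not from the paper):

```python
import numpy as np

def hyperspherical_energy(points, s=1.0, eps=1e-12):
    """Riesz s-energy of unit vectors: sum_{i<j} 1 / ||x_i - x_j||^s.
    Lower energy means the points are spread more uniformly on the sphere."""
    x = np.asarray(points, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # project to the sphere
    n = len(x)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            energy += 1.0 / (np.linalg.norm(x[i] - x[j]) ** s + eps)
    return energy
```

Clustered points yield high energy and well-spread points yield low energy, so selecting samples that keep this energy small favors representative, diverse coverage.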
Deep neural networks (DNNs) have recently achieved great success in many classification tasks. Unfortunately, they are vulnerable to adversarial attacks that generate adversarial examples with small perturbations to fool DNN models, especially in model-sharing scenarios. Adversarial training, which injects adversarial examples into model training to improve the robustness of DNN models against adversarial attacks, has proven to be the most effective strategy. However, adversarial training based on existing adversarial examples fails to generalize well to standard, unperturbed test data. To achieve a better trade-off between standard accuracy and adversarial robustness, we propose a novel adversarial training framework called latent boundary-guided adversarial training (LADDER), which adversarially trains DNN models on latent boundary-guided adversarial examples. In contrast to most existing methods that generate adversarial examples in the input space, LADDER generates a myriad of high-quality adversarial examples by adding perturbations to latent features. The perturbations are made along the normal of the decision boundary constructed by an SVM with an attention mechanism. We analyze the merits of the generated boundary-guided adversarial examples from a boundary field perspective and a visualization view. Extensive experiments and detailed analyses on MNIST, SVHN, CelebA, and CIFAR-10, compared with vanilla DNNs and competitive baselines, validate the effectiveness of LADDER in achieving a better trade-off between standard accuracy and adversarial robustness.
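The core perturbation step can be sketched for a linear boundary. A simplification under stated assumptions: `w`, `b` stand in for an SVM fitted on latent features (the attention mechanism is omitted), and the step size is illustrative:

```python
import numpy as np

def boundary_guided_perturb(z, w, b, step=0.5):
    """Perturb a latent feature z along the normal of a linear decision
    boundary w @ z + b = 0, moving it toward the boundary. This is a
    simplified stand-in for LADDER's latent boundary-guided generation."""
    n = w / np.linalg.norm(w)                     # unit normal of the boundary
    signed_dist = (z @ w + b) / np.linalg.norm(w)
    # step against the current side of the boundary (no move if exactly on it)
    return z - np.sign(signed_dist) * step * n
```

Moving along the boundary normal, rather than along an input-space gradient, is what keeps the generated examples close to the decision surface in feature space.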
In aspect-based sentiment classification (ASC), state-of-the-art models encode either the syntax graph or the relation graph to capture local syntactic information or global relational information. Despite their respective advantages, the syntax graph and the relation graph each suffer from ignoring the other kind of information, which limits the representation power in the graph modeling process. To resolve their limitations, we design a novel local-global interactive graph, which combines their advantages by stitching the two graphs together via interactive edges. To model the local-global interactive graph, we propose a novel neural network termed DigNet, whose core module is the stacked local-global interactive (LGI) layers performing two processes: intra-graph message passing and cross-graph message passing. In this way, local syntactic and global relational information can be reconciled as a whole for understanding aspect-level sentiment. Specifically, we design two variants of the local-global interactive graph with different kinds of interactive edges, and three variants of the LGI layers. We conduct experiments on several public benchmark datasets, and the results show that we overtake previous best scores by 3\%, 2.32\%, and 6.33\% in terms of Macro-F1 on the Lap14, Res14, and Res15 datasets, respectively, confirming the effectiveness and superiority of the proposed local-global interactive graph and DigNet.
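The two processes of an LGI layer can be sketched with plain matrix message passing. A heavy simplification, assuming row-normalized mean aggregation in place of the paper's attention, with residual updates; the function and argument names are illustrative:

```python
import numpy as np

def lgi_layer(h_syn, h_rel, a_syn, a_rel, a_cross):
    """One simplified local-global interactive (LGI) step: intra-graph
    message passing on the syntax and relation graphs, then cross-graph
    message passing along the interactive edges (a_cross: n_syn x n_rel)."""
    def propagate(adj, h):
        deg = adj.sum(axis=1, keepdims=True)
        return np.divide(adj @ h, deg, out=np.zeros_like(adj @ h), where=deg > 0)

    # intra-graph message passing
    h_syn = h_syn + propagate(a_syn, h_syn)
    h_rel = h_rel + propagate(a_rel, h_rel)
    # cross-graph message passing along interactive edges
    h_syn = h_syn + propagate(a_cross, h_rel)
    h_rel = h_rel + propagate(a_cross.T, h_syn)
    return h_syn, h_rel
```

Stacking several such layers lets local syntactic and global relational signals mix repeatedly, which is the reconciliation the abstract describes.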
Minimizing the prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning (SSL). The prediction uncertainty is typically expressed as the entropy computed from the transformed probabilities in the output space. Most existing works distill low-entropy predictions either by accepting the determinate class (with the largest probability) as the true label or by suppressing subtle predictions (with smaller probabilities). Either way, these distillation strategies are usually heuristic and less informative for model training. From this discernment, this paper proposes a dual mechanism named Adaptive Sharpening (ADS), which first applies a soft threshold to adaptively mask out the determinate and negligible predictions, and then seamlessly sharpens the informed predictions, distilling certain predictions with the informed ones only. More importantly, we theoretically analyze the traits of ADS by comparing it with various distillation strategies. Numerous experiments verify that ADS significantly improves state-of-the-art SSL methods as a plug-in. Our proposed ADS forges a cornerstone for future distillation-based SSL research.
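A loose sketch of the dual mechanism, under stated assumptions: the thresholds and temperature are illustrative hyperparameters, not values from the paper, and the hard high/low cutoffs approximate the soft threshold described above.

```python
import numpy as np

def adaptive_sharpen(p, low=0.1, high=0.9, temperature=0.5):
    """ADS-style distillation sketch: a determinate prediction (top
    probability >= high) is already low-entropy and is left as-is;
    negligible entries (< low) are masked out; the remaining informed
    entries are temperature-sharpened and renormalized."""
    p = np.asarray(p, dtype=float)
    if p.max() >= high:                     # determinate: nothing to distill
        return p
    q = np.where(p >= low, p, 0.0) ** (1.0 / temperature)
    return q / q.sum()
```

The effect is that only the informed part of the distribution drives the sharpened target, rather than a single argmax class or an unfiltered full distribution.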